EECS Rising Stars 2023




Tarasha Khurana

Model the 4D world, not the data



Research Abstract:

The current generation of machine perception relies on a seemingly harmless assumption -- that we must visually understand only what can be observed in data. This visible perception does not include parts of the 4D world that may be spatially occluded, or temporal events that did not -- but could have -- transpired. However, even in their early years, humans use such spatiotemporal observations to build cognitive priors that (1) explain their observations of the 4D world (e.g., seeing only a setting sun but understanding it is whole), and (2) support crucial decision making (e.g., enumerating all possible scenarios for crossing a busy road). Humans therefore understand that sensory observations are only a partial representation of the “true” underlying 4D world, which is, in reality, constrained by multiple physical processes. I aim to model inductive biases in deep networks that reflect these physical processes (e.g., the camera image formation process). This shifts the focus from memorizing sensory observations (e.g., images) to modeling the physical world itself (e.g., its 3D nature). This view is agnostic to the semantics of the sensory observations (e.g., the objects in the images), because the set of possible semantic categories is effectively infinite. In autonomous navigation, this semantics-agnostic view translates to safer behavior: a self-driving car no longer needs to name everything in the 4D world in order to avoid colliding with it (e.g., every piece of debris fallen on the road).

Bio:

Tarasha Khurana is a Ph.D. student at The Robotics Institute, Carnegie Mellon University, advised by Prof. Deva Ramanan. She works in computer vision, and her research focuses broadly on spatiotemporal 4D scene understanding, with an emphasis on building smart 4D foundation models. Previously, during her Master’s at CMU, she worked on estimating and exploiting 3D scene geometry from single images to reason about occlusions and to densify sparse depth input. Before that, she completed her Bachelor’s at the University of Delhi. Over the years, she has worked at Google Research, Argo AI, Staqu Technologies, and on various projects with the Government of India. Tarasha’s work on detection under extreme occlusions at ICCV 2021 was covered by NBC Universal. She has given invited talks at Bosch and Waabi AI, and has frequently organised popular workshops and challenges at ECCV and CVPR, two of the most prestigious conferences in her field.